63 research outputs found
In-Domain Self-Supervised Learning Can Lead to Improvements in Remote Sensing Image Classification
Self-supervised learning (SSL) has emerged as a promising approach for remote
sensing image classification due to its ability to leverage large amounts of
unlabeled data. In contrast to traditional supervised learning, SSL aims to
learn representations of data without the need for explicit labels. This is
achieved by formulating auxiliary tasks that can be used to create
pseudo-labels for the unlabeled data and learn pre-trained models. The
pre-trained models can then be fine-tuned on downstream tasks such as remote
sensing image scene classification. The paper analyzes the effectiveness of SSL
pre-training on Million AID, a large unlabeled remote sensing dataset, using
various remote sensing image scene classification datasets as downstream tasks.
More specifically, we evaluate the effectiveness of SSL pre-training using the
iBOT framework coupled with Vision transformers (ViT) in contrast to supervised
pre-training of ViT using the ImageNet dataset. The comprehensive experimental
work across 14 datasets with diverse properties reveals that in-domain SSL
leads to improved predictive performance of models compared to the supervised
counterparts.
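The idea of an auxiliary task that creates pseudo-labels from unlabeled data can be illustrated with a toy example. The sketch below uses rotation prediction, a simple and well-known pretext task; it is not the iBOT objective evaluated in the paper, and the function names and the tiny nested-list "image" are assumptions made only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image stored as nested lists


def rotate90(img: Image) -> Image:
    """Rotate a 2D image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def make_rotation_pseudo_labels(images: List[Image]) -> List[Tuple[Image, int]]:
    """Turn unlabeled images into (image, pseudo-label) pairs.

    Each image yields four rotated copies; pseudo-label k means the copy
    was rotated by k * 90 degrees. No human annotation is involved.
    """
    samples = []
    for img in images:
        rotated = img
        for k in range(4):
            samples.append((rotated, k))
            rotated = rotate90(rotated)
    return samples


# Hypothetical unlabeled data: a single 2x2 image.
unlabeled = [[[1, 2], [3, 4]]]
pretext_dataset = make_rotation_pseudo_labels(unlabeled)
```

A model pre-trained to predict the rotation index from `pretext_dataset` could then be fine-tuned on a labeled downstream task, which is the general pre-train and fine-tune pattern the abstract describes.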
Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification
Semi-supervised learning (SSL) is a common approach to learning predictive
models using not only labeled examples, but also unlabeled examples. While SSL
for the simple tasks of classification and regression has received a lot of
attention from the research community, it has not been properly investigated for
complex prediction tasks with structurally dependent variables. This is the
case of multi-label classification and hierarchical multi-label classification
tasks, which may require additional information, possibly coming from the
underlying distribution in the descriptive space provided by unlabeled
examples, to better face the challenging task of predicting simultaneously
multiple class labels.
In this paper, we investigate this aspect and propose a (hierarchical)
multi-label classification method based on semi-supervised learning of
predictive clustering trees. We also extend the method towards ensemble
learning and propose a method based on the random forest approach. Extensive
experimental evaluation conducted on 23 datasets shows significant advantages
of the proposed method and its extension with respect to their supervised
counterparts. Moreover, the method preserves interpretability and reduces the
time complexity of classical tree-based models.
Current Trends in Deep Learning for Earth Observation: An Open-source Benchmark Arena for Image Classification
We present 'AiTLAS: Benchmark Arena' -- an open-source benchmark framework
for evaluating state-of-the-art deep learning approaches for image
classification in Earth Observation (EO). To this end, we present a
comprehensive comparative analysis of more than 400 models derived from nine
different state-of-the-art architectures, and compare them on a variety of
multi-class and multi-label classification tasks from 22 datasets with
different sizes and properties. In addition to models trained entirely on these
datasets, we also benchmark models trained in the context of transfer learning,
leveraging pre-trained model variants, as is typically done in
practice. All presented approaches are general and can be easily extended to
many other remote sensing image classification tasks not considered in this
study. To ensure reproducibility and facilitate better usability and further
developments, all of the experimental resources including the trained models,
model configurations and processing details of the datasets (with their
corresponding splits used for training and evaluating the models) are publicly
available on the repository: https://github.com/biasvariancelabs/aitlas-arena
Data-Driven Structuring of the Output Space Improves the Performance of Multi-Target Regressors
peer-reviewed
The task of multi-target regression (MTR) is concerned with learning predictive models
capable of predicting multiple target variables simultaneously. MTR has attracted increasing attention
within the research community in recent years, yielding a variety of methods. The methods can be divided
into two main groups: problem transformation and problem adaptation. The former transform an MTR
problem into simpler (typically single-target) problems and apply known approaches, while the latter
adapt the learning methods to directly handle the multiple target variables and learn better models which
simultaneously predict all of the targets. Studies have identified the latter group of methods as having
a competitive advantage over the former, probably because they exploit the interrelations of the
multiple targets. In the related task of multi-label classification, it has been recently shown that organizing
the multiple labels into a hierarchical structure can improve predictive performance.
In this paper, we investigate whether organizing the targets into a hierarchical structure can improve the
performance for MTR problems. More precisely, we propose to structure the multiple target variables into
a hierarchy of variables, thus translating the task of MTR into a task of hierarchical multi-target regression
(HMTR). We use four data-driven methods for devising the hierarchical structure that cluster the real values
of the targets or the feature importance scores with respect to the targets. The evaluation of the proposed
methodology on 16 benchmark MTR datasets reveals that structuring the multiple target variables into a
hierarchy improves the predictive performance of the corresponding MTR models. The results also show
that data-driven methods produce hierarchies that can improve the predictive performance even more than
expert constructed hierarchies. Finally, the improvement in predictive performance is more pronounced for
the datasets with very large numbers (more than a hundred) of targets.
European Commissio
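One way to picture a data-driven target hierarchy is to cluster the target variables by how similarly their real values behave. The sketch below is a minimal illustration under stated assumptions: it uses average-linkage agglomerative clustering with a 1 - |Pearson correlation| distance, which is not necessarily one of the four methods used in the paper, and all function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long value lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def leaves(node):
    """Return the target names under a hierarchy node (a name or a nested tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def build_target_hierarchy(targets):
    """Greedily merge the two closest groups of targets until one root remains.

    targets: dict mapping a target name to its list of observed values.
    Returns a nested tuple; inner tuples are internal nodes of the hierarchy.
    """
    def distance(a, b):
        # Average linkage over all pairs of leaves in the two groups.
        pairs = [(p, q) for p in leaves(a) for q in leaves(b)]
        total = sum(1 - abs(pearson(targets[p], targets[q])) for p, q in pairs)
        return total / len(pairs)

    nodes = list(targets)
    while len(nodes) > 1:
        a, b = min(combinations(nodes, 2), key=lambda pair: distance(*pair))
        nodes = [n for n in nodes if n is not a and n is not b] + [(a, b)]
    return nodes[0]


# Hypothetical targets: y1 and y2 are strongly correlated, y3 is not.
targets = {
    "y1": [1.0, 2.0, 3.0, 4.0],
    "y2": [2.0, 4.0, 6.0, 8.5],
    "y3": [4.0, 1.0, 3.0, 2.0],
}
hierarchy = build_target_hierarchy(targets)
```

In this toy case the two correlated targets end up under a shared internal node, with the third target joining only at the root. A hierarchical multi-target regressor could then use such a structure in place of an expert-built hierarchy.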
Using machine learning to estimate herbage production and nutrient uptake on Irish dairy farms
peer-reviewed
Nutrient management on grazed grasslands is of critical importance to maintain productivity levels, as grass is the cheapest feed for ruminants and underpins these meat and milk production systems. Many attempts have been made to model the relationships between controllable (crop and soil fertility management) and noncontrollable influencing factors (weather, soil drainage) and nutrient/productivity levels. However, to the best of our knowledge, little research has been performed on modeling the interconnections between the influencing factors on the one hand and nutrient uptake/herbage production on the other by using data-driven modeling techniques. Our paper proposes to use predictive clustering trees (PCTs) to build models from data from dairy farms in the Republic of Ireland. The PCT models show good accuracy in estimating herbage production and nutrient uptake. They are also interpretable and are found to embody knowledge that is in accordance with existing theoretical understanding of the task at hand. Moreover, if we combine multiple PCTs into an ensemble (a random forest of PCTs), we can achieve improved accuracy of the estimates. In practical terms, the number of grazings, which is proportionally related to soil drainage class, is one of the most important factors that moderate the herbage production potential and nutrient uptake. Furthermore, we found the nutrient (N, P, and K) uptake and herbage nutrient concentration to be conservative in fields that had medium yield potential (11 t of dry matter per hectare on average), whereas nutrient uptake was more variable and potentially limiting in fields that had higher and lower herbage production. Our models also show that phosphorus is the most limiting nutrient for herbage production across the fields on these Irish dairy farms, followed by nitrogen and potassium.
Explainable Model-specific Algorithm Selection for Multi-Label Classification
Multi-label classification (MLC) is an ML task of predictive modeling in
which a data instance can simultaneously belong to multiple classes. MLC is
increasingly gaining interest in different application domains such as text
mining, computer vision, and bioinformatics. Several MLC algorithms have been
proposed in the literature, resulting in a meta-optimization problem that the
user needs to address: which MLC approach to select for a given dataset? To
address this algorithm selection problem, we investigate in this work the
quality of an automated approach that uses characteristics of the datasets -
so-called features - and a trained algorithm selector to choose which algorithm
to apply for a given task. For our empirical evaluation, we use a portfolio of
38 datasets. We consider eight MLC algorithms, whose quality we evaluate using
six different performance metrics. We show that our automated algorithm
selector outperforms any of the single MLC algorithms, and this is for all
evaluated performance measures. Our selection approach is explainable, a
characteristic that we exploit to investigate which meta-features have the
largest influence on the decisions made by the algorithm selector. Finally, we
also quantify the importance of the most significant meta-features for various
domains.
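The selection step described above, mapping dataset meta-features to a recommended algorithm, can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a 1-nearest-neighbour selector, whereas the abstract does not say which model the selector is; the two meta-features (label density, number of labels), the portfolio entries, and the algorithm names are hypothetical placeholders, not the paper's data.

```python
from math import dist

# Hypothetical meta-dataset: (meta-features, best-performing algorithm).
# Meta-features here are (label density, number of labels), chosen for illustration.
training = [
    ((0.10, 5.0), "binary_relevance"),
    ((0.15, 6.0), "binary_relevance"),
    ((0.80, 40.0), "classifier_chains"),
    ((0.75, 35.0), "classifier_chains"),
]


def fit_ranges(rows):
    """Return the (min, max) of each meta-feature column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, ranges):
    """Min-max scale one meta-feature vector so no feature dominates the distance."""
    return tuple(
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(row, ranges)
    )


def select_algorithm(meta_features, training):
    """Recommend the algorithm that was best on the most similar known dataset."""
    ranges = fit_ranges([features for features, _ in training])
    query = scale(meta_features, ranges)
    _, best = min(
        training,
        key=lambda item: dist(scale(item[0], ranges), query),
    )
    return best


# Recommend an algorithm for a new, unseen dataset.
recommendation = select_algorithm((0.12, 5.5), training)
```

Because the recommendation is traced back to a specific neighbouring dataset and its meta-features, even this simple selector offers a rudimentary form of the explainability the abstract emphasizes.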